Search for: All records

Creators/Authors contains: "Nandi, Arnab"


  1. Video data can be slow to process due to the size of video streams and the computational complexity needed to decode, transform, and encode them. These challenges are particularly significant in interactive applications, such as quickly generating compilation videos from a user search. We look at optimizing access to source video segments in multimedia systems where multiple separately encoded copies of video sources are available, such as proxy/optimized media in conventional non-linear video editors or VOD streams in content distribution networks. Rather than selecting a single source to use (e.g., "use the lowest-bitrate 720p source"), we specify a minimum visual quality (e.g., "use any frames with VMAF ≥ 85"). This quality constraint and the needed segment bounds are used to find the lowest-latency operations to decode a segment from multiple available sources with diverse bitrates, resolutions, and codecs. This approach uses higher-quality, slower-to-decode sources when their encoding is better aligned with the specific segment bounds, which can provide faster access than using just one lower-quality source. We provide a general solution to this Quality-Aware Multi-Source Selection problem with optimal computational complexity. We create a dataset using adaptive-bitrate streaming Video on Demand sources from YouTube's CDN. We evaluate our algorithm on simple segment decoding as well as embedded in a larger editing system, a declarative video editor. Our evaluation shows up to 23% lower access latency, depending on segment length, at identical visual quality levels.
    Free, publicly-accessible full text available March 31, 2026
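    As a rough illustration of the quality-constrained selection idea above, the sketch below picks, for a requested segment, the encode that can be decoded fastest while meeting a VMAF floor, accounting for how far back the nearest keyframe forces decoding to start. The Source fields, the per-frame cost model, and pick_source are illustrative assumptions, not the paper's optimal multi-source algorithm (which can also mix sources within a segment).

        import bisect
        from dataclasses import dataclass

        @dataclass
        class Source:
            name: str
            vmaf: float            # measured visual quality of this encode
            ms_per_frame: float    # average decode time per frame
            keyframes: list        # sorted keyframe positions (frame numbers)

        def pick_source(sources, seg_start, seg_end, min_vmaf=85.0):
            """Pick the encode that can decode frames [seg_start, seg_end]
            fastest while meeting the quality floor. Decoding must begin at
            the nearest keyframe at or before seg_start, so a higher-quality
            encode whose keyframe sits closer to the segment can win."""
            best, best_cost = None, float("inf")
            for s in sources:
                if s.vmaf < min_vmaf:
                    continue
                k = s.keyframes[bisect.bisect_right(s.keyframes, seg_start) - 1]
                cost = (seg_end - k + 1) * s.ms_per_frame
                if cost < best_cost:
                    best, best_cost = s, cost
            return best, best_cost

        sources = [
            Source("1080p", 96.0, 4.0, [0, 240, 480]),
            Source("720p",  88.0, 2.5, [0, 300, 600]),
        ]
        # The 1080p encode wins here: its keyframe at 480 sits much closer to
        # frame 500 than the 720p encode's keyframe at 300.
        print(pick_source(sources, seg_start=500, seg_end=550))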
  2. Querying video data has become increasingly popular and useful. Video queries can be complex, ranging from retrieval tasks (“find me the top videos that have … ”), to analytics (“how many videos contained object X per day?”), to excerpting tasks (“highlight and zoom into scenes with object X near object Y”), or combinations thereof. Results for video queries are still typically shown as either relational data or a primitive collection of clickable thumbnails on a web page. Presenting query results in this form is an impedance mismatch with the video medium: they are cumbersome to skim through and are in a different modality and information density compared to the source data. We describe V2V, a system to efficiently synthesize video results for video queries. V2V returns a fully-edited video, allowing the user to consume results in the same manner as the source videos. A key challenge is that synthesizing video results from a collection of videos is computationally intensive, especially within interactive query response times. To address this, V2V features a grammar to express video transformations in a declarative manner and a heuristic optimizer that improves the efficiency of V2V processing in a manner similar to how databases execute relational queries. Experiments show that our V2V optimizer enables video synthesis to run 3x faster. 
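    To make the idea of a declarative grammar plus rewrite-based optimization concrete, here is a toy expression tree for video transformations and one rewrite rule that pushes a trim through a concatenation so that less video needs to be decoded and stitched. The Clip/Concat/Trim operators and push_trim_into_concat are invented for illustration; they are not V2V's actual grammar or optimizer rules.

        from dataclasses import dataclass

        # Operators describe *what* to produce; an optimizer rewrites the tree.
        @dataclass
        class Clip:
            src: str
            start: float
            end: float

        @dataclass
        class Concat:
            parts: list

        @dataclass
        class Trim:
            child: object
            start: float
            end: float

        def push_trim_into_concat(expr):
            """Illustrative rewrite: trimming after concatenation becomes
            trimming (and dropping) the inputs first."""
            if isinstance(expr, Trim) and isinstance(expr.child, Concat):
                kept, offset = [], 0.0
                for p in expr.child.parts:
                    dur = p.end - p.start
                    lo = max(expr.start - offset, 0.0)
                    hi = min(expr.end - offset, dur)
                    if hi > lo:   # this input overlaps the trim window
                        kept.append(Clip(p.src, p.start + lo, p.start + hi))
                    offset += dur
                return Concat(kept)
            return expr

        plan = Trim(Concat([Clip("a.mp4", 0, 60), Clip("b.mp4", 0, 60)]), 50, 70)
        # Rewritten to Concat([Clip a.mp4 50-60, Clip b.mp4 0-10]).
        print(push_trim_into_concat(plan))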
  3. Introduction: Advancements in machine learning (ML) algorithms that make predictions from data without being explicitly programmed and the increased computational speeds of graphics processing units (GPUs) over the last decade have led to remarkable progress in the capabilities of ML. In many fields, including agriculture, this progress has outpaced the availability of sufficiently diverse and high-quality datasets, which now serve as a limiting factor. While many agricultural use cases appear feasible with current compute resources and ML algorithms, the lack of reusable hardware and software components, referred to as cyberinfrastructure (CI), for collecting, transmitting, cleaning, labeling, and training datasets is a major hindrance toward developing solutions to address agricultural use cases. This study focuses on addressing these challenges by exploring the collection, processing, and training of ML models using a multimodal dataset and providing a vision for agriculture-focused CI to accelerate innovation in the field.
     Methods: Data were collected during the 2023 growing season from three agricultural research locations across Ohio. The dataset includes 1 terabyte (TB) of multimodal data, comprising Unmanned Aerial System (UAS) imagery (RGB and multispectral), as well as soil and weather sensor data. The two primary crops studied were corn and soybean, which are the state's most widely cultivated crops. The data collected and processed from this study were used to train ML models to make predictions of crop growth stage, soil moisture, and final yield.
     Results: The exercise of processing this dataset resulted in four CI components that can be used to provide higher accuracy predictions in the agricultural domain. These components included (1) a UAS imagery pipeline that reduced processing time and improved image quality over standard methods, (2) a tabular data pipeline that aggregated data from multiple sources and temporal resolutions and aligned it with a common temporal resolution, (3) an approach to adapting the model architecture for a vision transformer (ViT) that incorporates agricultural domain expertise, and (4) a data visualization prototype that was used to identify outliers and improve trust in the data.
     Discussion: Further work will be aimed at maturing the CI components and implementing them on high performance computing (HPC). There are open questions as to how CI components like these can best be leveraged to serve the needs of the agricultural community to accelerate the development of ML applications in agriculture.
    Free, publicly-accessible full text available January 23, 2026
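    A minimal sketch of the temporal-alignment idea behind the tabular data pipeline: two sensor streams sampled at different rates are resampled to a shared daily resolution and joined into a single training table using pandas. The column names, sampling rates, and daily target resolution are assumptions for illustration, not the study's actual schema.

        import pandas as pd

        # Hypothetical inputs: raw sensor tables with their own timestamps and rates.
        soil = pd.DataFrame({
            "time": pd.date_range("2023-06-01", periods=96, freq="15min"),
            "soil_moisture": 0.30,
        }).set_index("time")
        weather = pd.DataFrame({
            "time": pd.date_range("2023-06-01", periods=24, freq="h"),
            "air_temp_c": 25.0,
        }).set_index("time")

        # Align everything to one common temporal resolution (daily here),
        # then join into a single feature table for model training.
        daily = soil.resample("1D").mean().join(weather.resample("1D").mean())
        print(daily)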
  4. The effective reporting of climate hazards, such as flash floods, hurricanes, and earthquakes, is critical. To quickly and correctly assess the situation and deploy resources, emergency services often rely on citizen reports that must be timely, comprehensive, and accurate. The pervasive availability and use of smartphone cameras allow the transmission of dynamic incident information from citizens in near-real-time. While high-quality reporting is beneficial, generating such reports can place an additional burden on citizens who are already suffering from the stress of a climate-related disaster. Furthermore, reporting methods are often challenging to use, due to their length and complexity. In this paper, we explore reducing the friction of climate hazard reporting by automating parts of the form-filling process. By building on existing computer vision and natural language models, we demonstrate the automated generation of a full-form hazard impact assessment report from a single photograph. Our proposed data pipeline can be integrated with existing systems and used with geospatial data solutions, such as flood hazard maps.
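    The form-filling automation can be pictured as a two-stage pipeline: a vision model describes the photo, and a language step maps that description onto the report schema. Everything below (caption_image, extract_fields, the REPORT_FIELDS schema) is a hypothetical stand-in, not the models or report form used in the paper.

        # Minimal pipeline sketch with placeholder model calls.
        REPORT_FIELDS = ["hazard_type", "water_depth", "road_passable", "structures_affected"]

        def caption_image(photo_path):
            # Stand-in: a vision-language model would describe the scene here.
            return "Flood water covering a residential street up to car wheels."

        def extract_fields(caption, fields):
            # Stand-in: a language model / extraction step would map the caption
            # onto the report schema; unfilled fields stay empty for human review.
            report = {f: None for f in fields}
            if "flood" in caption.lower():
                report["hazard_type"] = "flash flood"
                report["road_passable"] = False
            return report

        def autofill_report(photo_path):
            caption = caption_image(photo_path)
            return {"photo": photo_path, "caption": caption,
                    **extract_fields(caption, REPORT_FIELDS)}

        print(autofill_report("street.jpg"))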
  5. Unlike traditional object stores, Augmented Reality (AR) query workloads possess several unique characteristics, such as spatial and visual information. Such workloads are often keyed on a variety of attributes simultaneously, such as device orientation and position, the scene in view, and spatial anchors. The natural mode of user interaction on these devices triggers queries implicitly based on the user's field of view at any instant, generating data queries in excess of the device frame rate. Ensuring a smooth user experience in such a scenario requires a systemic solution exploiting the unique characteristics of AR workloads. For exploration in such contexts, we are presented with a view-maintenance or cache-prefetching problem: how do we download the smallest subset from the server to the mixed reality device such that latency and device space constraints are met? We present a novel data platform, DreamStore, that treats AR queries as first-class queries and builds view-maintenance and large-scale analytics infrastructure around this design choice. Through performance experiments on large-scale and query-intensive AR workloads on DreamStore, we show the advantages and the capabilities of our proposed platform.
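    One way to picture the view-maintenance/cache-prefetching problem above is as a budgeted selection: download the objects most likely to be touched by upcoming view-driven queries, best value-per-byte first, until device space runs out. The ArObject fields, the hit probabilities, and the greedy prefetch_plan below are illustrative assumptions, not DreamStore's actual algorithm.

        from dataclasses import dataclass

        @dataclass
        class ArObject:
            oid: str
            size_bytes: int
            hit_prob: float   # predicted chance the object enters the view soon

        def prefetch_plan(candidates, cached_ids, budget_bytes):
            """Greedy prefetch sketch: rank uncached objects by predicted
            value per byte and download until the space budget is exhausted."""
            plan, used = [], 0
            ranked = sorted((o for o in candidates if o.oid not in cached_ids),
                            key=lambda o: o.hit_prob / o.size_bytes, reverse=True)
            for obj in ranked:
                if used + obj.size_bytes <= budget_bytes:
                    plan.append(obj.oid)
                    used += obj.size_bytes
            return plan

        objs = [ArObject("anchor:desk", 2_000_000, 0.9),
                ArObject("mesh:hall", 40_000_000, 0.4),
                ArObject("tag:door", 500_000, 0.7)]
        print(prefetch_plan(objs, cached_ids=set(), budget_bytes=10_000_000))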
  6. Along with textual content, visual features play an essential role in the semantics of visually rich documents. Information extraction (IE) tasks perform poorly on these documents if these visual cues are not taken into account. In this paper, we present Artemis, a visually aware, machine-learning-based IE method for heterogeneous visually rich documents. Artemis represents a visual span in a document by jointly encoding its visual and textual context for IE tasks. Our main contribution is two-fold. First, we develop a deep-learning model that identifies the local context boundary of a visual span with minimal human labeling. Second, we describe a deep neural network that encodes the multimodal context of a visual span into a fixed-length vector by taking its textual and layout-specific features into account. It identifies the visual span(s) containing a named entity by leveraging this learned representation followed by an inference task. We evaluate Artemis on four heterogeneous datasets from different domains over a suite of information extraction tasks. Results show that it outperforms state-of-the-art text-based methods by up to 17 points in F1-score.
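    The core idea of jointly encoding a visual span's text and layout can be sketched as concatenating a textual feature vector with normalized bounding-box features to obtain a fixed-length representation. The hashed bag-of-words and the four-dimensional layout vector below are simplifications standing in for Artemis's learned encoders, and the page dimensions are assumed.

        import numpy as np

        def text_features(tokens, dim=64):
            # Toy hashed bag-of-words stand-in for a learned text encoder.
            v = np.zeros(dim)
            for t in tokens:
                v[hash(t) % dim] += 1.0
            return v

        def layout_features(bbox, page_w, page_h):
            # Normalized position and size of the span on the page.
            x0, y0, x1, y1 = bbox
            return np.array([x0 / page_w, y0 / page_h,
                             (x1 - x0) / page_w, (y1 - y0) / page_h])

        def encode_span(tokens, bbox, page_w=612, page_h=792):
            """Fixed-length multimodal representation of a visual span:
            textual content concatenated with layout cues. A trained model
            would replace both feature extractors; this only shows the shape
            of the joint encoding."""
            return np.concatenate([text_features(tokens),
                                   layout_features(bbox, page_w, page_h)])

        vec = encode_span(["Invoice", "Total:", "$1,240.00"], bbox=(400, 700, 580, 730))
        print(vec.shape)   # (68,) -- ready for a downstream tagger/classifier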
  7. Classifying heterogeneous visually rich documents is a challenging task, and the difficulty increases further when the maximum allowed inference turnaround time is constrained by a threshold. The increased inference cost, compared to the limited gain in classification capability, makes current multi-scale approaches infeasible in such scenarios. This work makes two major contributions. First, we propose a spatial pyramid model to extract highly discriminative multi-scale feature descriptors from a visually rich document by leveraging the inherent hierarchy of its layout. Second, we propose a deterministic routing scheme for accelerating end-to-end inference by utilizing the spatial pyramid model. A depth-wise separable multi-column convolutional network is developed to enable our method. We evaluate the proposed approach on four publicly available benchmark datasets of visually rich documents. Results suggest that our approach demonstrates robust performance compared to state-of-the-art methods in both classification accuracy and total inference turnaround.
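    The two contributions can be illustrated together: pool a document image over successively finer grids to obtain multi-scale descriptors, then route inference so that finer, more expensive levels are consulted only when the coarser level is not confident enough, keeping turnaround bounded. The pooling scheme, confidence threshold, and stand-in classifiers below are assumptions, not the paper's depth-wise separable multi-column network.

        import numpy as np

        def spatial_pyramid(image, levels=(1, 2, 4)):
            """Mean-pool a 2-D document image over coarser-to-finer grids,
            mirroring a layout hierarchy (page -> regions -> blocks)."""
            h, w = image.shape
            feats = []
            for g in levels:
                cells = [image[i*h//g:(i+1)*h//g, j*w//g:(j+1)*w//g].mean()
                         for i in range(g) for j in range(g)]
                feats.append(np.array(cells))
            return feats   # one descriptor per pyramid level

        def routed_predict(feats, classifiers, conf_threshold=0.9):
            """Deterministic routing sketch: try the cheap, coarse descriptor
            first and stop as soon as a level is confident enough."""
            for level_feats, clf in zip(feats, classifiers):
                label, conf = clf(level_feats)
                if conf >= conf_threshold:
                    return label, conf
            return label, conf   # fall back to the finest level's answer

        img = np.random.rand(256, 256)
        feats = spatial_pyramid(img)
        # Stand-in classifiers returning (label, confidence) per level.
        dummy = [lambda f: ("invoice", 0.5),
                 lambda f: ("invoice", 0.95),
                 lambda f: ("form", 0.99)]
        print(routed_predict(feats, dummy))   # stops at the second level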
  8. Physical and digital documents often contain visually rich information. With such information, there is no strict ordering or positioning in the document where the data values must appear. Along with textual cues, these documents often also rely on salient visual features to define distinct semantic boundaries and augment the information they disseminate. When performing information extraction (IE), traditional techniques fall short, as they use a text-only representation and do not consider the visual cues inherent to the layout of these documents. We propose VS2, a generalized approach for information extraction from heterogeneous visually rich documents. There are two major contributions of this work. First, we propose a robust segmentation algorithm that decomposes a visually rich document into a bag of visually isolated but semantically coherent areas, called logical blocks. Document-type-agnostic low-level visual and semantic features are used in this process. Our second contribution is a distantly supervised search-and-select method for identifying the named entities within these documents by utilizing the context boundaries defined by these logical blocks. Experimental results on three heterogeneous datasets suggest that the proposed approach significantly outperforms its text-only counterparts on all datasets. Comparing it against state-of-the-art methods also reveals that VS2 performs comparably or better on all datasets.
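    A toy version of the segment-then-search idea: group words into visually isolated blocks using whitespace gaps, then look for a named entity only inside the block whose context contains a trigger term. The vertical-gap threshold and the trigger-based lookup are illustrative stand-ins for VS2's segmentation features and its distantly supervised search-and-select step.

        from dataclasses import dataclass

        @dataclass
        class Word:
            text: str
            x: float
            y: float   # top-left position on the page

        def segment_blocks(words, y_gap=18.0):
            """Crude stand-in for visual segmentation: words sorted by vertical
            position start a new logical block whenever the gap to the previous
            line exceeds a whitespace threshold."""
            blocks, current, prev_y = [], [], None
            for w in sorted(words, key=lambda w: (w.y, w.x)):
                if prev_y is not None and w.y - prev_y > y_gap:
                    blocks.append(current)
                    current = []
                current.append(w)
                prev_y = w.y
            if current:
                blocks.append(current)
            return blocks

        def find_entity(blocks, trigger):
            """Search-and-select sketch: look only inside the block whose
            context mentions the trigger, and return the value next to it."""
            for block in blocks:
                texts = [w.text for w in block]
                if trigger in texts:
                    i = texts.index(trigger)
                    return texts[i + 1] if i + 1 < len(texts) else None
            return None

        words = [Word("Invoice", 40, 40), Word("#1021", 120, 40),
                 Word("Total:", 40, 400), Word("$86.20", 110, 400)]
        print(find_entity(segment_blocks(words), "Total:"))   # -> $86.20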
  9. Sampling is often used to reduce query latency for interactive big data analytics. The established parallel data processing paradigm relies on function shipping, where a coordinator dispatches queries to worker nodes and then collects the results. The commoditization of high-performance networking makes data shipping possible, where the coordinator directly reads data in the workers’ memory using RDMA while workers process other queries. In this work, we explore when to use function shipping or data shipping for interactive query processing with sampling. Whether function shipping or data shipping should be preferred depends on the amount of data transferred, the current CPU utilization and the sampling method. The results show that data shipping is up to 6.5× faster when performing clustered sampling with heavily-utilized workers. 
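    The shipping decision can be framed as a simple cost comparison: estimate the time to pull the sampled blocks over RDMA versus the time to queue the query on a busy worker, and pick the cheaper side. The constants and the cost model in choose_execution below are made-up illustrations of that trade-off, not the paper's measured costs.

        def choose_execution(sample_bytes, worker_cpu_util, network_gbps=100,
                             cpu_busy_threshold=0.8, rdma_overhead_ms=0.2):
            """Toy cost model (not the paper's): is it cheaper for the
            coordinator to pull the sampled blocks over RDMA (data shipping)
            or to queue the query on the worker (function shipping)?"""
            # RDMA transfer time in milliseconds for the sampled bytes.
            transfer_ms = sample_bytes * 8 / (network_gbps * 1e6) + rdma_overhead_ms
            # Heuristic: a heavily loaded worker delays local execution, so a
            # small clustered sample is faster to ship to the coordinator.
            queueing_penalty_ms = 50.0 if worker_cpu_util > cpu_busy_threshold else 0.0
            local_ms = 5.0 + queueing_penalty_ms
            return "data shipping" if transfer_ms < local_ms else "function shipping"

        # Clustered sample of ~16 MB read from a heavily utilized worker:
        print(choose_execution(sample_bytes=16_000_000, worker_cpu_util=0.95))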